AOL4PS: A Large-Scale Data Set for Personalized Search

Abstract

Personalized search is a promising way to improve the quality of Web search, and it has attracted much attention from both academic and industrial communities. Much of the current related research is based on commercial search engine data, which cannot be released publicly for such reasons as privacy protection and information security. This leads to a serious lack of accessible public data sets in this field. The few publicly available data sets have not become widely used in academia because of the complexity of the processing required to study personalized search methods. The lack of data sets, together with the difficulties of data processing, has brought obstacles to the fair comparison and evaluation of personalized search models. In this paper, we constructed a large-scale data set, AOL4PS, to evaluate personalized search methods, collected and processed from AOL query logs. We present the complete and detailed data processing and construction process. Specifically, to address the challenges of processing time and storage space demands brought by massive data volumes, we optimized the process of data set construction and proposed an improved BM25 algorithm. Experiments are performed on AOL4PS with some classic and state-of-the-art personalized search methods, and the experiment results demonstrate that AOL4PS can measure the effect of personalized search models.

Similar resources

Personalized Data Set for Analysis

Data Management portfolio within an organization has seen an upsurge in initiatives for compliance, security, repurposing and storage within and outside the organization. When such initiatives are being put to practice care must be taken while granting access to data repositories for analysis and mining activities. Also, initiatives such as Master Data Management, cloud computing and self servi...

A Large Scale, Cross-disease Family Health History Data Set

Introduction: A family health history data set needs to be evaluated before being applied to the study of genetic diseases, genetic counseling, and epidemiological studies. We have obtained a large scale, cross-disease family health history data set (FhhDS) from electronic discharge summaries at Columbia Presbyterian Medical Center by using a pattern matching parser we have developed 1. Currently, Fh...

A large-scale crop protection bioassay data set

ChEMBL is a large-scale drug discovery database containing bioactivity information primarily extracted from scientific literature. Due to the medicinal chemistry focus of the journals from which data are extracted, the data are currently of most direct value in the field of human health research. However, many of the scientific use-cases for the current data set are equally applicable in other ...

A Practical Desalinization Model for Large Scale Application

Salinity of soil and water is the most important agricultural hazard in arid and semi-arid regions. In saline soils, yield production is directly influenced by soluble salts in the root zone as well as by shallow water table depth. The first step in the reclamation of such soils is reducing salinity to an optimum level by leaching. The objective of this study was to develop a practical model to estimate wat...

A partition-based algorithm for clustering large-scale software systems

Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...

Journal

Journal title: Data Intelligence

Year: 2021

ISSN: 2096-7004, 2641-435X

DOI: https://doi.org/10.1162/dint_a_00104